A method for the online construction of the set of states of a Markov Decision Process using Answer Set Programming

Authors

  • Leonardo Anjoletto Ferreira
  • Reinaldo A. C. Bianchi
  • Paulo E. Santos
  • Ramon López de Mántaras
Abstract

Non-stationary domains, which change in unpredicted ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)), a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules represent a set of domain constraints that are processed as ASP programs, reducing the search space. Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process.
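
The general idea of growing the state set during interaction can be illustrated with a minimal sketch. This is not the authors' implementation: the ASP programs are replaced here by a plain Python set of forbidden (state, action) pairs, the learner is ordinary tabular Q-learning, and every name (OnlineStateQLearner, observe_state, add_constraint, the four grid actions) is a hypothetical choice made for illustration only.

```python
import random

# Assumption: a small grid-like domain with four actions (not taken from the paper).
ACTIONS = ("up", "down", "left", "right")


class OnlineStateQLearner:
    """Tabular Q-learning whose state set grows as states are observed.

    Stand-in for the ASP component: observed domain constraints are kept
    as a set of forbidden (state, action) pairs that prune the search space.
    """

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.q = {}             # state -> {action: value}; keys form the known state set
        self.forbidden = set()  # (state, action) pairs ruled out by observed changes

    def observe_state(self, state):
        # Add a state to the state set the first time it is seen.
        if state not in self.q:
            self.q[state] = {a: 0.0 for a in ACTIONS}

    def add_constraint(self, state, action):
        # Record an observed domain change; learnt Q-values are left untouched.
        self.forbidden.add((state, action))

    def allowed_actions(self, state):
        return [a for a in ACTIONS if (state, a) not in self.forbidden]

    def choose(self, state):
        # Epsilon-greedy selection restricted to actions the constraints allow.
        self.observe_state(state)
        allowed = self.allowed_actions(state) or list(ACTIONS)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(allowed)
        return max(allowed, key=lambda a: self.q[state][a])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update, maximising over allowed actions only.
        self.observe_state(state)
        self.observe_state(next_state)
        allowed = self.allowed_actions(next_state)
        best_next = max((self.q[next_state][a] for a in allowed), default=0.0)
        td_error = reward + self.gamma * best_next - self.q[state][action]
        self.q[state][action] += self.alpha * td_error
```

In this sketch, calling add_constraint after the environment changes only narrows which actions are considered; the stored action values are kept, which loosely mirrors the abstract's claim that the constraints do not interfere with the action-value approximation.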




Journal:
  • CoRR

Volume: abs/1706.01417

Publication date: 2017